Structured and Extended Named Entity Evaluation in Automatic Speech Transcriptions

Authors

  • Olivier Galibert
  • Sophie Rosset
  • Cyril Grouin
  • Pierre Zweigenbaum
  • Ludovic Quintard
Abstract

The evaluation of named entity recognition (NER) methods is an active field of research, and it extends to the recognition of named entities in speech transcripts. Evaluating NER systems on automatic speech recognition (ASR) output, when the human reference annotation was prepared on clean manual transcripts, raises difficult alignment issues. These issues are compounded when the named entities are structured, as is the case in the Quaero NER challenge organized in 2010. This paper describes the structured named entity definition used in this challenge and presents a method to transfer reference annotations to ASR output. This method was used in the Quaero 2010 evaluation of extended named entity annotation on speech transcripts, whose results are given in the paper.

Similar resources

How to evaluate ASR output for named entity recognition?

The standard metric to evaluate automatic speech recognition (ASR) systems is the word error rate (WER). WER has proven very useful in stand-alone ASR systems. Nowadays, these systems are often embedded in complex natural language processing systems to perform tasks like speech translation, man-machine dialogue, or information retrieval from speech. This exacerbates the need for the speech proce...

Mining Broadcast News data: Robust Info Lattices

Fine-grained information extraction performance from spoken corpora is strongly correlated with the Word Error Rate (WER) of the automatic transcriptions processed. Despite the recent advances in Automatic Speech Recognition (ASR) methods, high WER transcriptions are common when dealing with unmatched conditions between the documents to process and those used to train the ASR models. Such misma...

A Simple Yet Effective Approach for Named Entity Recognition from Transcribed Broadcast News

Automatic speech transcriptions pose serious challenges for NLP systems due to various peculiarities in the data. In this paper, we propose a simple approach for NER on speech transcriptions which achieves good results despite the peculiarities. The novelty of our approach is that it emphasizes maximum exploitation of the tokens, as they are, in the data. We developed a system for partici...

Mining broadcast news data: robust information extraction from word lattices

Fine-grained information extraction performance from spoken corpora is strongly correlated with the Word Error Rate (WER) of the automatic transcriptions processed. Despite the recent advances in Automatic Speech Recognition (ASR) methods, high WER transcriptions are common when dealing with unmatched conditions between the documents to process and those used to train the ASR models. Such misma...

Incorporating Speech Recognition Confidence into Discriminative Named Entity Recognition of Speech Data

This paper proposes a named entity recognition (NER) method for speech recognition results that uses confidence on automatic speech recognition (ASR) as a feature. The ASR confidence feature indicates whether each word has been correctly recognized. The NER model is trained using ASR results with named entity (NE) labels as well as the corresponding transcriptions with NE labels. In experiments...

Publication date: 2011